Goto

Collaborating Authors

 target position


Automated Construction of Artificial Lattice Structures with Designer Electronic States

Narasimha, Ganesh, Telychko, Mykola, Yang, Wooin, Baddorf, Arthur P., Ganesh, P., Li, An-Ping, Vasudevan, Rama

arXiv.org Artificial Intelligence

Manipulating matter with a scanning tunneling microscope (STM) enables creation of atomically defined artificial structures that host designer quantum states. However, the time-consuming nature of the manipulation process, coupled with the sensitivity of the STM tip, constrains the exploration of diverse configurations and limits the size of designed features. In this study, we present a reinforcement learning (RL)-based framework for creating artificial structures by spatially manipulating carbon monoxide (CO) molecules on a copper substrate using the STM tip. The automated workflow combines molecule detection and manipulation, employing deep learning-based object detection to locate CO molecules and linear assignment algorithms to allocate these molecules to designated target sites. We initially perform molecule maneuvering based on randomized parameter sampling for sample bias, tunneling current setpoint and manipulation speed. This dataset is then structured into an action trajectory used to train an RL agent. The model is subsequently deployed on the STM for real-time fine-tuning of manipulation parameters during structure construction. Our approach incorporates path planning protocols coupled with active drift compensation to enable atomically precise fabrication of structures with significantly reduced human input while realizing larger-scale artificial lattices with desired electronic properties. Using our approach, we demonstrate the automated construction of an extended artificial graphene lattice and confirm the existence of characteristic Dirac point in its electronic structure. Further challenges to RL-based structural assembly scalability are discussed.


Dynamic Reconfiguration of Robotic Swarms: Coordination and Control for Precise Shape Formation

Prasertying, Prab, Garcia, Paulo, Sritriratanarak, Warisa

arXiv.org Artificial Intelligence

Coordination of movement and configuration in robotic swarms is a challenging endeavor. Deciding when and where each individual robot must move is a computationally complex problem. The challenge is further exacerbated by difficulties inherent to physical systems, such as measurement error and control dynamics. Thus, how to best determine the optimal path for each robot, when moving from one configuration to another, and how to best perform such determination and effect corresponding motion remains an open problem. In this paper, we show an algorithm for such coordination of robotic swarms. Our methods allow seamless transition from one configuration to another, leveraging geometric formulations that are mapped to the physical domain through appropriate control, localization, and mapping techniques. This paves the way for novel applications of robotic swarms by enabling more sophisticated distributed behaviors.


SIRI: Spatial Relation Induced Network For Spatial Description Resolution

Neural Information Processing Systems

Explicitly characterizing an object-level relationship while distilling spatial relationships are currently absent but crucial to this task. Mimicking humans, who sequentially traverse spatial relationship words and objects with a first-person view to locate their target, we propose a novel spatial relationship induced (SIRI) network.


Endowing GPT-4 with a Humanoid Body: Building the Bridge Between Off-the-Shelf VLMs and the Physical World

Jian, Yingzhao, Wang, Zhongan, Yang, Yi, Fan, Hehe

arXiv.org Artificial Intelligence

Humanoid agents often struggle to handle flexible and diverse interactions in open environments. A common solution is to collect massive datasets to train a highly capable model, but this approach can be prohibitively expensive. In this paper, we explore an alternative solution: empowering off-the-shelf Vision-Language Models (VLMs, such as GPT-4) to control humanoid agents, thereby leveraging their strong open-world generalization to mitigate the need for extensive data collection. To this end, we present \textbf{BiBo} (\textbf{B}uilding humano\textbf{I}d agent \textbf{B}y \textbf{O}ff-the-shelf VLMs). It consists of two key components: (1) an \textbf{embodied instruction compiler}, which enables the VLM to perceive the environment and precisely translate high-level user instructions (e.g., {\small\itshape ``have a rest''}) into low-level primitive commands with control parameters (e.g., {\small\itshape ``sit casually, location: (1, 2), facing: 90$^\circ$''}); and (2) a diffusion-based \textbf{motion executor}, which generates human-like motions from these commands, while dynamically adapting to physical feedback from the environment. In this way, BiBo is capable of handling not only basic interactions but also diverse and complex motions. Experiments demonstrate that BiBo achieves an interaction task success rate of 90.2\% in open environments, and improves the precision of text-guided motion execution by 16.3\% over prior methods. The code will be made publicly available.


Falconry-like palm landing by a flapping-wing drone based on the human gesture interaction and distance-aware flight planning

Numazato, Kazuki, Kan, Keiichiro, Kitagawa, Masaki, Li, Yunong, Kubel, Johannes, Zhao, Moju

arXiv.org Artificial Intelligence

Flapping-wing drones have attracted significant attention due to their biomimetic flight. They are considered more human-friendly due to their characteristics such as low noise and flexible wings, making them suitable for human-drone interactions. However, few studies have explored the practical interaction between humans and flapping-wing drones. On establishing a physical interaction system with flapping-wing drones, we can acquire inspirations from falconers who guide birds of prey to land on their arms. This interaction interprets the human body as a dynamic landing platform, which can be utilized in various scenarios such as crowded or spatially constrained environments. Thus, in this study, we propose a falconry-like interaction system in which a flapping-wing drone performs a palm landing motion on a human hand. To achieve a safe approach toward humans, we design a trajectory planning method that considers both physical and psychological factors of the human safety such as the drone's velocity and distance from the user. We use a commercial flapping platform with our implemented motion planning and conduct experiments to evaluate the palm landing performance and safety. The results demonstrate that our approach enables safe and smooth hand landing interactions. To the best of our knowledge, it is the first time to achieve a contact-based interaction between flapping-wing drones and humans.


HeLoM: Hierarchical Learning for Whole-Body Loco-Manipulation in Hexapod Robot

Yang, Xinrong, Li, Peizhuo, Li, Hongyi, Lu, Junkai, Chang, Linnan, Cao, Yuhong, Zhang, Yifeng, Sun, Ge, Sartoretti, Guillaume

arXiv.org Artificial Intelligence

Robots in real-world environments are often required to move/manipulate objects comparable in weight to their own bodies. Compared to grasping and carrying, pushing provides a more straightforward and efficient non-prehensile manipulation strategy, avoiding complex grasp design while leveraging direct contact to regulate an object's pose. Achieving effective pushing, however, demands both sufficient manipulation forces and the ability to maintain stability, which is particularly challenging when dealing with heavy or irregular objects. To address these challenges, we propose HeLoM, a learning-based hierarchical whole-body manipulation framework for a hexapod robot that exploits coordinated multi-limb control. Inspired by the cooperative strategies of multi-legged insects, our framework leverages redundant contact points and high degrees of freedom to enable dynamic redistribution of contact forces. HeLoM's high-level planner plans pushing behaviors and target object poses, while its low-level controller maintains locomotion stability and generates dynamically consistent joint actions. Our policies trained in simulation are directly deployed on real robots without additional fine-tuning. This design allows the robot to maintain balance while exerting continuous and controllable pushing forces through coordinated foreleg interaction and supportive hind-leg propulsion. We validate the effectiveness of HeLoM through both simulation and real-world experiments. Results show that our framework can stably push boxes of varying sizes and unknown physical properties to designated goal poses in the real world.


SIRI: Spatial Relation Induced Network For Spatial Description Resolution

Neural Information Processing Systems

Explicitly characterizing an object-level relationship while distilling spatial relationships are currently absent but crucial to this task. Mimicking humans, who sequentially traverse spatial relationship words and objects with a first-person view to locate their target, we propose a novel spatial relationship induced (SIRI) network.



Modular Design of Strict Control Lyapunov Functions for Global Stabilization of the Unicycle in Polar Coordinates

Todorovski, Velimir, Kim, Kwang Hak, Krstic, Miroslav

arXiv.org Artificial Intelligence

Abstract-- Since the mid-1990s, it has been known that, unlike in Cartesian form where Brockett's condition rules out static feedback stabilization, the unicycle is globally asymptotically stabilizable by smooth feedback in polar coordinates. In this note, we introduce a modular framework for designing smooth feedback laws that achieve global asymptotic stabilization in polar coordinates. These laws are bidirectional, enabling efficient parking maneuvers, and are paired with families of strict control Lyapunov functions (CLFs) constructed in a modular fashion. The resulting CLFs guarantee global asymptotic stability with explicit convergence rates and include barrier variants that yield "almost global" stabilization, excluding only zero-measure subsets of the rotation manifolds. The strictness of the CLFs is further leveraged in our companion paper, where we develop inverse-optimal redesigns with meaningful cost functions and infinite gain margins.


Learning to Generate Pointing Gestures in Situated Embodied Conversational Agents

Deichler, Anna, Wang, Siyang, Alexanderson, Simon, Beskow, Jonas

arXiv.org Artificial Intelligence

One of the main goals of robotics and intelligent agent research is to enable natural communication with humans in physically situated settings. While recent work has focused on verbal modes such as language and speech, non-verbal communication is crucial for flexible interaction. We present a framework for generating pointing gestures in embodied agents by combining imitation and reinforcement learning. Using a small motion capture dataset, our method learns a motor control policy that produces physically valid, naturalistic gestures with high referential accuracy. We evaluate the approach against supervised learning and retrieval baselines in both objective metrics and a virtual reality referential game with human users. Results show that our system achieves higher naturalness and accuracy than state-of-the-art supervised models, highlighting the promise of imitation-RL for communicative gesture generation and its potential application to robots.